Modelling parametric uncertainty in large-scale stratigraphic simulations

We combine forward stratigraphic models with a suite of uncertainty quantification and stochastic model calibration algorithms for the characterization of sedimentary successions in large scale systems. The analysis focuses on the information value provided by a probabilistic approach in the modelling of large-scale sedimentary basins. Stratigraphic forward models (SFMs) require a large number of input parameters usually affected by uncertainty. Thus, model calibration requires considerable time both in terms of human and computational resources, an issue currently limiting the applications of SFMs. Our work tackles this issue through the combination of sensitivity analysis, model reduction techniques and machine learning-based optimization algorithms. We first employ a two-step parameter screening procedure to identify relevant parameters and their assumed probability distributions. After selecting a restricted set of important parameters these are calibrated against available information, i.e., the depth of interpreted stratigraphic surfaces. Because of the large costs associated with SFM simulations, probability distributions of model parameters and outputs are obtained through a data driven reduced complexity model. Our study demonstrates the numerical approaches by considering a portion of the Porcupine Basin, Ireland. Results of the analysis are postprocessed to assess (i) the uncertainty and practical identifiability of model parameters given a set of observations, (ii) spatial distribution of lithologies. We analyse here the occurrences of sand bodies pinching against the continental slope, these systems likely resulting from gravity driven processes in deep sea environment.


Geological setup
The Porcupine Basin is a large N-S oriented, sediment underfilled, offshore basin, located approximately 150 km to the west of Ireland. It is bordered to the north and west by the Porcupine High, to the east by the Irish western shelf, to the south by the Goban Spur, and to the west it opens to the Atlantic (Fig. 1a). The basin is approximately 300 km long and widens southwards up to a width of 200 km and it is considered an aborted early attempt of the northern Atlantic Ocean to open. Present-day water depths range from several hundred meters on the flanking shelves to more than 3000 m in the central and southern parts of the basin 19 . There is no evidence of oceanic crust at the basin floor except for the southern extremity 20,21 . The basin infilling history started during Jurassic and it is still developing 22,23 . The present-day basin configuration is the result of a geologic evolution started in Middle to Late Jurassic with a major rifting phase 25 , possibly continuing into the Early Cretaceous in a so-called "transition phase" 17,26,27 (Fig. 1b). A transition to thermally-controlled subsidence accommodated a 4-6 km thick Cretaceous succession of largely deep-water sediments 24,28,29 that draped and onlapped the earlier fault-controlled depocenters. Although the Early Cretaceous is generally considered a post-rift thermal subsidence phase during which deep-water conditions prevailed, the Lower Cretaceous succession contains numerous unconformities, some of which are associated with periods of increased clastic input 28 . Following the rifting phase, during which the basin is characterized by a dissected and rapidly evolving topography, in the Early Cretaceous subsidence became more uniform across the whole Porcupine, this evolutionary step being usually termed "sag" phase 21 .
Our study simulates the basin evolution from the Base Cretaceous Unconformity (BCU) to present day. A key objective is to characterize sediment distributions spanning from the BCU, which marks the end of the main Middle/Late Jurassic rifting phase, to the top of a regional scale chalk unit which dates lower Paleocene (Fig. 1b,c). The occurrence of possible clastic stratigraphic traps pinching against the southern border of the basin can be assumed from available seismic interpretations (Fig. 1c), but their location remains uncertain. Our aim is to better constrain the possible spatial location of such sandstone bodies (Fig. S1).

Methods
Overview and general workflow. Our numerical procedure is based on a series of four steps (see Fig. 2): 1. Definition of an initial geological model featuring a given model structure together with an initial guess of the model parameter variability ranges. Our model setup is here based on the Dionisos modelling framework 9,13 . www.nature.com/scientificreports/ 2. The screening step features a global sensitivity analysis whose aim is to diagnose the response of model outputs to uncertain input parameters based on the elementary effects method 31,32 . Then, a Principal Component Analysis (PCA) is performed to rank parameters importance 7,33 . As a result we discard parameters displaying a negligible influence on model outputs and we revise the initially defined parameters ranges. 3. Model reduction techniques are implemented to reduce the computational cost required to predict the key outputs of the problem. Surrogate models are typically constructed using data-driven approach via the response of the full simulator to selected locations in the parameter space 34,35 . Here, the surrogate model is built on the Polynomial Chaos Expansion (PCE) technique with a reduced number of parameters selected through the screening step 36-38 . 4. Stochastic inverse modelling is performed to estimate input parameters with available data. Here a Particle Swarm Optimization (PSO) technique is leveraged in a probabilistic framework [39][40][41] . Millions of parameter realizations are tested to select a sample of optimal parameters combinations, so the use of the surrogate model is important for computational efficiency.  The outcome of the procedure is a sample of model realizations, which, in turn, yields empirical probability distributions for both input parameter values considered optimal to fit the available data and output quantities of interest. The final results can then be compared with available geological interpretations and other soft data, similar to the ones presented in Fig. 1c. With respect to the method presented by Patani et al. 7 , we present a significant difference in the parameters screening step which is completed and reinforced by additional criteria and analysis (Section "Range analysis"), together with the probabilistic interpretation of model results in the context of risk-assessment analysis (see Section "Identification of geological features across the calibrated sample").
Dionisos model. Dionisos is a process-based modelling tool designed to perform three-dimensional numerical simulations of basin-scale stratigraphic reconstructions 13 . Full model forward simulations were here performed with Dionisos, version 4.93. Dionisos requires specifying parameters related to physical, biological and geochemical processes including accommodation, production, erosion and transport of sediments. The simulations are performed in a user-defined time interval in a sequence of time steps. At each time step, three main tasks are performed: • Definition of accommodation space for the sedimentation and its variation with time as a consequence of basin deformation due to eustasy, compaction or flexure; • Definition of sediment supply which may correspond to a source or an in-situ marine carbonate production; • Simulation of sediment transport using deterministic laws and the mass balance principle combined with the diffusion equation.
Transport of sediments is modelled by hillslope creeping and fluvial transport 9 . A planar two-dimensional sediment flux Q sed,i [km 2 /ky] ( i = X, Y , being the two directions of space, see Fig. 2) is evaluated by these two mechanisms Q sed,i = Q HC,i + Q FT,i , where are sediment discharges generated by hillslope creeping and fluvial transport, respectively, K s and K w are constant slope-and water-driven diffusion coefficients [km 2 /ky], S i is the local topographical gradient along the i-direction, Q w is a dimensionless water flux, and N q and N s are fixed coefficients. Equations (1) and (2) are coupled with mass conservation to model the transport process 13 where Z [m] is the depth with respect to a fixed refence plane.
For the study case presented here the outputs of Dionisos are the depth maps associated with a given age and volume fraction of various types of sediments (sand, silt, shale, carbonate mud and carbonate grains) in the considered geological area, as described in Section "Dionisos model".
The considered geological domain consists of N cells = 56 × 70 = 3920 cells where the grid size is 2 km, thus covering 15,680 km 2 . The simulation considers the interval 146-0 Ma with a time step equal to 1 My. Two lateral sources of sediment supply are imposed in a stationary position and width located in the southern and in the eastern boundaries of the model. Figure 3 depicts the planar geometry together with the bathymetry map at 146 Ma (measured with respect to the current sea level) and the two sediment supply sources locations. Paleobathymetries are obtained from regional stratigraphic and sedimentological studies based on the seismic data interpretation. (1)

Uncertain inputs and output quantities of interest.
We consider uncertainty in a set of 139 parameters grouped into four main categories, each corresponding to a specific process contributing to the stratigraphic simulation, i.e., (i) siliciclastic supply (68 parameters), (ii) carbonate production (3), (iii) transport and erosion (44), and (iv) eustasy and compaction (24). Siliciclastic supply from two sources (34 parameters in each) includes sediment and boundary fluvial discharges together with the volumetric fractions of sand, silt and shale. Carbonate mud production started at 90 Ma and was active until 66 Ma according to the expected basin stratigraphy (see Fig. 1b Fig. 1b). Our objective is to calibrate the Dionisos model using the data Z * = Z * n a j , n = 1 : N cells , j = 1 : N ages . This calibration is the used to infer distribution of sand bodies in the selected domain. Therefore, a key quantity that we will consider is the spatial distribution of the sand volumetric fraction within the computational domain. For a given cell n we define the average sand fraction during the interval a j = [a j , a j+1 ] as V sand = V sand,n a j , n = 1 : N cells , j = 1 : N ages − 1. This quantity describes the average sand fraction for each cell and during each one of the seven time intervals a j . There is no available direct measurement of the variable V sand , but only qualitative information of possible accumulation areas in the domain (see Fig. 1c). Therefore, the spatial distribution of V sand is not used to calibrate model parameters but is computed as a predicted quantity of interest.
Screening. Parameter screening is performed with a twofold aim: (i) identifying relevant parameters contributing to the selected goal (see Sections "Sensitivity analysis"-"Parameters selection") and (ii) evaluate the consistency between the assumed parameters intervals and the available data (see Section "Range analysis"). Both these steps are taken with a view to the parameter calibration step, to achieve a better identifiability of the calibrated parameters and at the same time to help maintaining a reasonable computational cost.
Sensitivity analysis. Sensitivity analysis is employed to rank parameters by quantifying their contribution to quantities of interest. We start here by considering a total number of parameters equal to N , where we set N = 139 in the Porcupine test case, as described in Section "Dionisos model". To this end the Morris indices are where EE i is the elementary effect associated with the variation of the i-th parameter, is a given increment in the parameter space, p N is the parameter vector and y represents a scalar output corresponding either to Z n (a j ) or V sand,n (�a j ) in a given cell of the domain. Elementary effects thus represent a discretization of the gradient of an output y with respect to one parameter at a time, while other parameters are fixed. We design M trajectories across the parameter space where an appropriate value of M can be chosen depending on the number of parameters 32 . The trajectories are built upon subdividing each selected parameter range in N levels which are equally spaced between the predefined minimum and maximum value of each parameter. The sampling is performed by changing a single parameter value between two subsequent points along a trajectory. Each node of the trajectory corresponds to a parameter realization for which a Dionisos simulation is performed thus providing the outputs of interest. Every trajectory allows then to evaluate an elementary effect for each parameter. After evaluation of the elementary effect EE i,k for each trajectory k = 1 : M and for each parameter p i , i = 1 : N , we consider the following indices: These indices represent the mean (Eq. (5)), the standard deviation (Eq. (6)) and the mean of the absolute values of EE s (Eq. (7)). These indices are calculated at each cell of the domain at every age in a for variables Z and V sand . The total number of simulations required by the procedure is N real = M · (N + 1).
Parameters selection. The screening procedure consists of two steps leading to successive reduction of the number of parameters. In the first step, we aim to discard the parameters that have negligible influence on the model outputs. To this end, for each age and output variable, we average the indices µ * and σ over the cells of the domain where y n refers to the output Z a j or V sand a j in cell n . Quantity σ avg is calculated according to the law of total variance and Var represents the variance of µ(p i , y n ) over the cells of the domain.
We then define a reasonable threshold thr Z and thr S for Z and V sand , respectively. Then at every age in a we select the parameters for which µ * avg or σ avg is greater than these thresholds. We thus obtain two p Z j and p S j vectors containing the parameters having an impact for the simulation of Z(a j ) and V sand (�a j ).
After selecting the important parameters for every model output, the first phase is finalized by taking the union of the sets of the important parameters for each output and for each age: The first step of the screening aims to neglect the parameters whose contribution is negligible on the whole set of outputs of interest, thus considering all the parameters contributing to the outputs in the next steps of the procedure. Vector p sel will then be a vector of dimension N sel < N.
The second step is performed with the reduced set of parameters p sel . Using this reduced parameter vector we repeat the sampling procedure introduced in Section "Sensitivity analysis" and thus obtain a new set of parameters realizations and the corresponding outputs evaluated via simulation through the Dionisos model. We then apply a Principal Component Analysis (PCA) 33 to a matrix J whose components are the Morris indices µ defined in Eq. (5). More in detail we define the matrix J as www.nature.com/scientificreports/ which gathers all the sensitivity indices for each parameter, cell locations and ages, and where R is the number of individual output variable evaluations in each model output (for instance, for output Z , we consider a value for each cell for each simulated age, i.e., R = N ages × N cells ). Matrix J can be considered as an approximation of the Jacobian matrix reporting the average value of the derivative of outputs with respect to input parameters. The contribution of each parameter to Z and V sand is then evaluated based on the eigenvalues of the matrix J T J (for more details see 7 ) where j are the eigenvalues of the matrix J T J . The first N rdc parameters according to this ranking are selected for Z and V sand , and we then consider the union of these two parameters set.
Range analysis. Parameter ranges are initially fixed a priori from the literature, geological proprietary measurements and expert opinion. To fix the range the values p i,min , p i,max are given. Considering the large number of parameters this task can be very challenging especially for complex geological domains. To assist the modelling, we propose here a quantitative method to assess the consistency between the selected parameter intervals and available observations.
To this end each parameter realization employed in the screening process is ranked according to the following objective function J k where Z n p k , a j is the depth calculated by Dionisos simulation at cell n , age a j and for the k th realization of the parameters p k . Smaller values of J k imply a smaller difference between data and model results. Thus, we perform the following steps: 1. Rank each parameter realization according to the value of J k . 2. Select the realizations r k which lie in the n th percentile according to such ranking, thus obtaining the subset S n . 3. Compute the probability distributions of a given realization being in the set S n for each selected level and for each parameter, Quantity F n defines an empirical probability of finding a realization in the set S n conditional to a given parameter level, normalized by the absolute sample probability Pr(r k ∈ S n ) (this latter being equal to n/100 by definition). The distribution of F n (i, l) across the levels is used to validate or revise the range for every parameter p i . Notably, if the values attained by F n become large close to the interval boundaries this is interpreted as an indication that the fitness of the model may be increasing when the parameter approaches the value p i,min or p i,max . These bounds are then revised to increase or reduce the width of the considered range. Specifically, in this study we have employed N levels = 8 , thus we have analysed cumulative values of F n (i, l) at levels l = 1, 2 and l = 7, 8 to revise the lower and upper bound of the range of each parameter, respectively (see Section "Parameter screening results" for a numerical example). Note that different choices are possible for quantity N levels . Increasing N levels is providing a finer discretization of the parameter range, but at the same time decreases the sample size of realizations available for a given level, thus increasing statistical noise in evaluating Eq. (14). In our preliminary test we found N levels = 8 is a reasonable compromise between resolution and computational cost. The outlined procedure aims at identifying macroscopic trends indicating that some specific values of a model parameter may lead to a clear departure from the observed data, while other values indicate a behaviour closer to the data. We recall that in this step of the procedure the sampling is not meant at optimizing the objective function reported in Eq. (13), therefore we cannot draw any conclusion about final calibrated parameters values and the idea is to merely correct any range that shows an important misalignment with the data and to help modelers to deal with large number of input parameters. Range analysis is performed in both steps of the screening procedure presented in Section "Parameters selection".
Model reduction. Application of SFMs in the industrial context is hampered by the computational cost required for numerical simulations. Here, a surrogate model is employed to mimic the selected outputs of Dionisos model. The surrogate model is formulated through the generalized Polynomial Chaos Expansion (gPCE) technique which estimates the model outputs through a multi-dimensional polynomial formulation in terms of input parameters 42 . A given model output Z p (at every cell and age) is approximated as 37  13)) is calculated for parameter combination corresponding to each particle x j . Particle positions and speeds are updated according to the following expressions 39,44 .
where ω is the inertia weight, ϕ is called cognitive coefficient and r t is a random coefficient updated at each step t . The vector x t best,j identifies the parameter combination providing the lowest objective function value among those experienced by particle j , while g t best is the location of maximum fitness (minimum distance to data) ever discovered by all particles of the swarm. Both these locations are updated at each iteration t . Convergence of the algorithm is obtained when an optimum of the objective function is attained or a predefined number of displacements is reached. In this analysis convergence criterion is set as a number of displacements which is equal to T = 1000 which does not ensure the global minimum is attained but is a good compromise between the accuracy and the computational cost. At the final iteration we identify the best fit parameter value as p = g t=T best . This procedure is repeated for different random initial parameters combinations, thus yielding a sample of N calib of parameter values p 1 , . . . , p N calib , each corresponding to the solution obtained minimizing the distance between the surrogate model and the data. This parameter sample is then used to simulate a sample of N calib realizations of the full stratigraphic model through Dionisos, thus obtaining the probabilistic characterization of all the outputs of interest for the selected parameter values.

Results
Parameter screening results. For the first step of the screening 50 trajectories are generated with 139 parameters resulting in 7000 Dionisos forward simulations. Sensitivity indices are calculated for Z n (a j ) depth and V sand,n (�a j ) volume fraction at every age a j or interval a j and cell n using Eqs. (8) and (9). Thresholds are defined for Z and V sand as thr Z = 5m and thr S = 1% . These thresholds are considerably smaller than the range of variability of the model outputs and are below the accepted uncertainty in the outputs. Hence, at this step of the screening only the uninfluential parameters will be discarded.
To exemplify the results obtained, Fig. 4 shows µ * versus σ at 89 Ma for Z (a) and in the interval 89-66 Ma for V sand (b). Red lines represent the selected thresholds, while solid blue points represent the computed µ * avg and σ avg values. The parameters with µ * avg and σ avg falling in the lower-left quarter of the plots are deemed to be negligible for these specific outputs. For the outputs considered in Fig. 4, 105 and 103 parameters would be eliminated by considering separately Z and V sand , respectively. The error bars reported in Fig. 4 represent the variability of µ * in the spatial domain by indicating range of values comprised between the 5th and 95th percentile.
We exemplify the type of information that can be obtained from this analysis by studying the sensitivity of model outputs with respect to a single parameter related to carbonate production. The black point represents parameter #71. Figure 5 shows the spatial distribution of µ * for this parameter at 89 Ma for Z (a) and in the interval 89-66 Ma for V sand (b). Parameter #71 is carbonate mud production that is activated at 90 Ma in the chalk deposition phase (see Fig. 1b). Since parameter #71 activates carbonate muds accumulations at 90 Ma, it has negligible influence on V sand in the previous ages. However, this parameter can induce variations of the observed depth at 89 Ma up to 250 m on average in certain spatial locations, notably, in the southern region of the domain where carbonate deposits may accumulate in shallow marine environment (see Fig. 5a). Note also that the same parameter has a definite influence on V sand in the interval between 90 and 89 Ma. However, the spatial extent of the region where the parameter is important varies between the two outputs (compare Fig. 5a (15)  Fig. 5b). In particular, the impact of carbonate production extends to a large portion of the domain when V sand is considered, with an important effect observed in the south-western corner of the domain. The final parameters set p sel selected at this first screening step consists of 97 parameters neglecting 42 parameters out of 139. Figure 6 shows the distribution of the 97 selected parameters (solid bars) over all initial set (transparent bars) expressing different processes contributing to the overall sedimentological output. We observe that parameters associated with supply location and width and carbonate production are all considered important. On the other hand, slope failure parameters are always negligible, thus indicating such process is not contributing significantly to the selected outputs. Information like the one gathered here can assist modellers in identifying which processes and parameters are worth further investigation (for example through dedicated analyses of literature and proprietary data) and which can be conversely neglected.
Following the procedure outlined in Section "Range analysis" we analyse the investigated parameters intervals. Figure 7 exemplifies two results of this procedure. Each bar in the plot represents the probability F n (i, l) to find a realization in the set S n at the considered level where n = 5 . Red crossed lines indicate the preliminary intervals while the green crossed lines correspond to the new intervals bounds. In Fig. 7a we observe that the distribution of parameter #71 ( F n (i = 71, l) , associated with maximum carbonate mud production at 90 Ma [m/My] is concentrated close to the right boundary. Hence, the parameter range is enlarged from the interval [5,25] to [5,35]. Figure 7b depicts the distribution F n (i = 107, l) , for parameter #107 which is slope driven marine silt transport coefficient [km 2 /Ky]. The total frequency F n is 2% in the two levels associated with the top 25% of the interval (corresponding with the last two levels). As a result, the procedure proposes a reduction of the upper bound of the interval from 1 to 0.75. It is important to note that range modification should be performed according to the physical and geological meaning of the parameters, therefore parameter modifications are finalized only after human intervention, i.e., after checking that the new proposed boundaries are not exceeding physical or geological meaning associated with given quantities (e.g., transport coefficients cannot assume negative values).  Figure 8 shows the contribution defined in Eq. (12) of each selected parameter to the quantities of interest. We observe that the some of the selected parameters (#19, 20, 53, 54, 55, 58, 59) have negligible influence on Z , but are retained in the parameters list since they show a relevant impact on V sand . Cumulative contribution of these 20 parameters to the Z and V sand variability is 92.47% and 76.79%, respectively. This result is considered a good compromise in the model reduction process. The parameters selected at the end of the complete screening procedure are displayed in Table 1 together with their units and the final ranges of variability, obtained investigating parameter intervals as in the first screening step.

Identifiability of model parameters.
We present here the results of model calibration, performed following the procedure outlined in Section "Stochastic inverse modelling". The calibration phase involves approximately 50 million evaluations. To perform the stochastic calibration the surrogate model is used as explained in Section "Model reduction". This enables the procedure to be computationally affordable compared to the Dionisos model, considering that a full model run requires more than one hour while the surrogate model is associated with a computational cost of about 1 s. The accuracy of the surrogate model in mimicking the full Dionisos simulations is discussed in supplementary materials.   Table 1. www.nature.com/scientificreports/ can also be considered identifiable, because the estimates are close to a physical limit for these parameters. For example, parameter 6 is estimated close to 0. This parameter physically corresponds to a solid discharge that cannot attain negative values, thus indicating that the parameter best estimate can be identified and is close to zero. • Non-identifiable parameters: these parameters show approximately uniform distributions across the investigated range (see parameters #20, 53, 54, 83, 139 in Fig. 9). The procedure thus indicates that the estimation process is essentially insensitive to these quantities. It is important to note that these parameters are those exhibiting negligible contribution to Z , which is the quantity employed for calibration (see results in Fig. 8) and these results confirm the expected negligible influence on depth quantities. These results indicate that additional information is needed to identify these parameters, e.g., well data reporting an estimate of the volume fractions associated with various lithologies. • Parameters requiring further investigation: some of the marginal distributions reported in Fig. 9 appear to concentrate near the boundary of the parameter intervals (# 9, 19, 43, 55, 58, 59). In general, such a result indicates that these parameters require further investigation, as their optimal range may be found outside the one initially selected for the estimation.
Close observation of the results given by the parameter estimation can yield interesting indications to interpret the geological setting in the Porcupine area. Generally, sediment inputs are predominantly entering the domain through the eastern source (source 2). However, a possible southern source (source 1) is included in the model. Estimates of the solid discharge of source 1 at 113 Ma (parameter 6) concentrate around 0, while the solid discharge at the same source is estimated close to the maximum allowed value of 150 km 3 /My at 47 Ma (parameter 9). This indicates the sediment delivery from the southern source (source 1, see Fig. 3) may have increased in recent ages while being relatively negligible at older ages. Note that the interval bounds originally assigned to parameters 6 and 9 favoured an opposite behaviour (larger values were allowed for parameter 6 if compared to parameter 9 in the initial ranges). Information gained from the parameter calibration process can then be used in relation to existing paleoclimatic, and paleogeographic models from which sediment supplies are usually assumed. Results could also be used in the context of so-called source-to-sink analysis to assess possible location and extent of past sediment sources and fairways. These parameters are typically plagued by considerable uncertainty and are related to larger spatial scales than the one considered to model a single sedimentary environment. Therefore, quantitative information like the one provided are relevant to future efforts aimed at improving the geological characterization of basin fills, such as the Porcupine basin's one. Figure 9 also reports updates of the parameter ranges resulting from the adopted criteria. These corrections mostly proved to be useful, although they did not avoid the occurrence of marginal distributions concentrating on the boundary of the intervals (see for example the distribution obtained for parameter 6). In most cases the reduction of the range led to the desired result, allowing to neglect portions of parameter intervals which are far from the optimal ones (see, e.g., parameter #37,39,107 in Fig. 9). Yet, in some cases reducing the parameter range appears unjustified in light of the results presented in Fig. 9. Neglecting portions of the parameter space 6   www.nature.com/scientificreports/ may lead to an incomplete assessment of the parametric uncertainty, see e.g., the marginal distributions of parameters 2 and 43. This result suggests that modifications and particularly reduction of the considered ranges should be performed with care.
Identification of geological features across the calibrated sample. The analysis of the seismic data collected from Porcupine basin shows the possible occurrence of sand bodies lying between the horizons corresponding at 129 and 113 Ma, i.e. the sag phase (see Fig. 1b,c). These sands are interpreted as being deposited in a submarine fan environment by subaqueous sediment gravity flows [45][46][47] . They could have been triggered by slope failures or by huge riverine floods entering sea water and funnelled along the slope as hyperpycnal flows 48,49 . Based on qualitative interpretations (see Fig. 1c) these deposits may show a pinch-out geometry towards the base of the continental slope, which is indicative of a possible stratigraphic trap that may hold hydrocarbons 50 .
Pinch-outs are formed where the slope angle considerably decreases, allowing for sediment deposition. At seismic scale, vertical and lateral stacking of several depositional events generate large scale pinch-out geometries. During the assessment of the stratigraphic traps through forward modelling, it can be very challenging to locate the feeding channel, determine the accumulation potential of the sediments and detect the geometry of reservoir traps 49 . Our aim is here to demonstrate how these features can be identified upon relying on our probabilistic approach. First, we consider the probability of observing sand accumulations, which can be expressed by the exceedance probability to observe a volumetric fraction of sand larger than a given threshold. Figure 10a depicts the exceedance probabilities Pr(V sand > 0.2) , which is based on the relative frequency of V sand predicted using the N calib realization of the full model. Figure 10b shows a diagonal cross section of the domain together with the seismic data Z * surfaces drawn as white layers. Black layers represent the surfaces corresponding to the horizons corresponding at 129 and 113 Ma. The pinch-out is detected at the region where these two surfaces are close to each other. Figure 10b shows that the probability to find V sand > 20% in the pinch-out region is close to 20% over the 500 simulations.
To provide a more quantitative assessment of the sand accumulation downstream the pinch-out we have performed the analysis in two steps: (i) we identify a pinch-out location, by considering locations where the sediment thickness accumulated in the sag phase (between 129 and 113 Ma, Z 129 113 ) becomes smaller than 50 m, (ii) we calculate the total volume of the sand in the region downstream (i.e., north-west with respect to the pinch-out line) for 500 simulations, the region of interest being indicated by black dots in Fig. 11a. Figure 11b shows that the accumulation of sand in this area is predicted only by 123 out of 500 simulations.
With the aim of characterizing the geometry of these possible sand accumulations, we present the spatial distribution of the exceedance probability Pr(V sand > 0.2) between 129 and 113 Ma in the restricted sample of the 123 realizations where sand accumulation is observed. Figure 12 identifies two sand bodies in the domain which can be interpreted as fan lobes. The probability map suggests that these two bodies can coexist as separate fans or together as one fan system. These results are significant in the context of exploration activities related to energy resources, e.g., exploration well placement, and to identify likely locations of stratigraphic traps for hydrocarbon accumulations. Our probabilistic approach can be used to propagate uncertainty to subsequent modelling steps such as petroleum system modelling. These latter typically require incorporating source locations and migration pathways and therefore need to consider an even larger scale than the one considered by stratigraphic models. Therefore, results such as the ones presented in Fig. 12 can be extremely useful to inform migration models on the local features of the system. In this context, our approach can provide probabilistic inputs for subsequent modelling analyses.

Discussion and conclusions
We apply a probabilistic methodology to support quantitative geological interpretation through stratigraphic models in a real sedimentary setting. The Porcupine basin is considered as a test case for the application. The results obtained allow drawing general conclusions regarding the information value that can be gained from the application of the proposed method to a real scenario. Our study leads to the following conclusions: 1. Our procedure seamlessly integrates expert knowledge with reproducible quantitative indicators to reduce parameterization complexity of stratigraphic forward models. For the Porcupine basin case the uncertain parameter space is reduced from 139 to 20 dimensions, based on sensitivity metrics. The approach informs about the relevant contribution of individual processes and parameters to key targets of interests (e.g., horizon depths or sand accumulation). This screening step is complemented with a preliminary evaluation of the suitability of the selected parameter ranges in mimicking available observations. This information is then used to revise or corroborate expert opinion on the selected variability intervals for model parameters. 2. The probability distribution of calibrated parameters is obtained via an iterative procedure leveraging reduced order numerical model and a particle swarm optimization algorithm. We obtain an assessment of the practical identifiability of model parameters, this concept being associated with the possibility to reliably estimate parameters given a set of observations. Identifiability is assessed through a statistical postprocessing of the probability distributions associated with estimated parameters. 3. Our results lead to further insight on local sediment source dynamics. Sediment supplies in the Porcupine area are dominated by the eastern source whose discharge values are well identified by the proposed procedure. Our results indicate sediment inputs from southern sources are likely to be negligible at older times (113 Ma) while increasing to values exceeding the initial range at more recent ages (from 47 Ma).  www.nature.com/scientificreports/ 4. Postprocessing of the results allows for identification of upcurrent pinching fan lobes in the so-called sag phase comprised between 129 and 113 Ma. We identify the location of sand accumulations, indicative of the occurrence of possible stratigraphic traps. This analysis provides a quantitative counterpart to qualitative interpretations (such as the one presented in Fig. 1). This analysis is conducted in a probabilistic framework where we identify and delineate volumes associated with sand fraction exceeding a given threshold. Probabilistic results can be readily used in the context of risk assessment procedures for subsurface resources exploration.

Data availability
The data used to generate the results of this study are available upon reasonable request to the corresponding author.